Top_Keyword: An Aggregation Function for Textual Document OLAP
Authors
Abstract
For more than a decade, research on OLAP and multidimensional databases has generated methodologies, tools and resource management systems for the analysis of numeric data. With the growing availability of digital documents, there is a need to incorporate text-rich documents into multidimensional databases, along with an adapted framework for their analysis. This paper presents a new aggregation function that aggregates textual data in an OLAP environment. The TOP_KEYWORD function (TOP_KW for short) represents a set of documents by their most significant terms using a weighting function from information retrieval: tf.idf.
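The abstract does not give the exact definition of TOP_KW, so the following is only a minimal sketch of the general idea, under stated assumptions: documents are already tokenized, term frequency is computed over all documents in the aggregated cell, and inverse document frequency is computed over the whole collection. The function name `top_keywords`, its parameters, and the sample data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def top_keywords(cell_docs, corpus, k=2):
    """Return the k terms of `cell_docs` with the highest tf.idf score.

    Assumptions (not from the paper): each document is a list of tokens,
    tf is the relative frequency of a term across the whole cell, and
    idf is log(N / df) computed over `corpus`.
    """
    n_docs = len(corpus)

    # Document frequency: number of corpus documents containing each term.
    df = Counter()
    for doc in corpus:
        df.update(set(doc))

    # Term counts over every document aggregated into the cell.
    counts = Counter()
    for doc in cell_docs:
        counts.update(doc)
    total = sum(counts.values())

    # tf.idf score per term; terms unseen in the corpus are skipped.
    scores = {
        term: (count / total) * math.log(n_docs / df[term])
        for term, count in counts.items()
        if df[term] > 0
    }

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical collection of five tokenized documents.
corpus = [
    ["olap", "cube", "sales"],
    ["olap", "text", "text", "keyword"],
    ["olap", "text", "document"],
    ["olap", "cube", "query"],
    ["olap", "sales", "report"],
]
# The "cell" aggregates the two text-related documents.
cell = corpus[1:3]
print(top_keywords(cell, corpus, k=2))
```

In this sketch a term such as "olap", which occurs in every document of the collection, gets an idf of zero and therefore never ranks among the top keywords, which is the usual motivation for weighting by tf.idf rather than by raw frequency.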
Similar resources
Olap aggregation function for textual data warehouse
For more than a decade, OLAP and multidimensional analysis have generated methodologies, tools and resource management systems for the analysis of numeric data. With the growing availability of semistructured data there is a need for incorporating text-rich document data in a data warehouse and providing adapted multidimensional analysis. This paper presents a new aggregation function for keywo...
OLAP textual aggregation approach using the Google similarity distance
Data warehousing and On-Line Analytical Processing (OLAP) are essential elements to decision support. In the case of textual data, decision support requires new tools, mainly textual aggregation functions, for better and faster high level analysis and decision making. Such tools will provide textual measures to users who wish to analyse documents online. In this paper, we propose a new aggregat...
Multidimensional Analysis of XML Document Contents with OLAP Dimensions
With the emergence of Semi-structured data format (such as XML), the storage of documents in centralised facilities appeared as a natural adaptation of data warehousing technology. Nowadays, OLAP (On-Line Analytical Processing) systems face growing non-numeric data. This chapter presents a framework for the multidimensional analysis of textual data in an OLAP sense. Document structure, metadata...
A Formal Framework of Aggregation for the OLAP-OLTP Model
OLAP applications are widely used in business applications. They are often (implicitly) defined on top of OLTP systems and extensively use aggregation and transformation functions. The main OLAP data structure is a multidimensional table with three kinds of attributes: so-called dimension attributes, implicit attributes given by aggregation functions and fact attributes. Domains of dimension at...
Meta-Stars: Dynamic, Schemaless, and Semantically-Rich Topic Hierarchies in Social BI
A key role in OLAP analyses of textual user-generated content for social business intelligence (SBI) is played by topics, i.e., concepts of interest within a subject area. Topic hierarchies are irregular, heterogeneous, dynamic, and possibly schemaless; besides, unlike in traditional OLAP, different semantics for topic aggregation can be envisioned. In this demonstration we present an architectu...
Journal title:
Volume, issue
Pages -
Publication date: 2008